6 Interactions
So far we’ve only created features from single variables, but what about the effects of two variables together? Would it help our models if we added an interaction effect between, for example, HomePlanet and Destination to create the new feature HomeDestination? What about other such interactions?
In the previous competition on the Titanic of 1912, the sex of a passenger mattered (women were more likely to survive) and the ticket class mattered (first-class passengers were more likely to survive), but the interaction “woman in first class” had an almost 100% chance of survival, and this interaction improved the model (if I remember correctly). We want to discover whether such interactions exist between the variables that we have for our space odyssey.
Of course, we won’t know the extent of the improvement until we test the interaction effects in various models. Some models, such as tree-based models, discover interactions on their own, so adding explicit interaction effects might not matter for them, while it might matter for others.
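As a small illustration of what a combined feature such as HomeDestination could look like, here is a minimal sketch in base R. The data frame `toy` and its four rows are made up for the example and are not the competition data; `interaction()` is only one of several ways to build such a feature.

```r
# Toy data with made-up rows (an assumption for illustration only)
toy <- data.frame(
  HomePlanet  = c("Earth", "Europa", "Mars", "Earth"),
  Destination = c("TRAPPIST-1e", "55 Cancri e", "TRAPPIST-1e", "TRAPPIST-1e")
)

# Combine the two categorical variables into one factor;
# drop = TRUE removes combinations that never occur in the data
toy$HomeDestination <- interaction(
  toy$HomePlanet, toy$Destination,
  sep = "_", drop = TRUE
)

# Count how often each observed combination occurs
print(table(toy$HomeDestination))
```

Each observed pair of values becomes its own level, so a model can assign a separate effect to, say, passengers travelling from Earth to TRAPPIST-1e.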
6.1 Visual exploration of interactions
Since we only have a few categorical variables, let’s visualize all possible pairwise interactions. Here I use the default glm logistic regression model with the formula
Transported ~ (.)^2, which amounts to Outcome ~ Variable_1 + Variable_2 + Variable_1 x Variable_2.
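To see what the `(.)^2` shorthand expands to, one can ask R for the terms of the formula. The sketch below uses a toy data frame with hypothetical column names (`y`, `x1`, `x2`) rather than the real data.

```r
# Toy data frame; the column names are hypothetical stand-ins
toy <- data.frame(
  y  = c(0, 1, 0, 1),
  x1 = c("a", "b", "a", "b"),
  x2 = c("u", "u", "v", "v")
)

# The dot stands for every column except the response;
# ^2 adds all pairwise interactions between those columns
expanded <- terms(y ~ (.)^2, data = toy)
term_labels <- attr(expanded, "term.labels")
print(term_labels)
```

The term labels are the two main effects, `x1` and `x2`, followed by their interaction `x1:x2`, which matches the expanded form of the formula given above.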
plot_simple_int <- function(df, v1, v2) {
tmp_vars <- c("Transported", v1, v2)
tmp_model <- glm(Transported ~ (.)^2, data = df[, tmp_vars], family = binomial())
p <- interactions::cat_plot(tmp_model, pred = {{ v1 }}, modx = {{ v2 }}, geom = "line", colors = c25,
main.title = paste(v1, "and", v2)) +
theme(text = element_text(size = 40), plot.title = element_text(size = 40))
return(p)
}
preds_cat <- c("CryoSleep", "HomePlanet", "Destination", "VIP", "Deck", "Side")
pairs_cat <- combn(preds_cat, 2, simplify = FALSE)
c25 <- c("dodgerblue2", "#E31A1C", "green4", "#6A3D9A", "#FF7F00", "black", "gold1", "skyblue2", "#FB9A99", "palegreen2", "#CAB2D6",
"#FDBF6F", "gray70", "khaki2", "maroon", "orchid1", "deeppink1", "blue1", "steelblue4", "darkturquoise", "green1", "yellow4",
"yellow3", "darkorange4", "brown") # Had to add extra colours because I couldn't get `cat_plot` to work with defaults
save_plot <- function(p, i) {
ggsave(filename = paste0("PairInt", i, ".png"), plot = p)
}
# cat_pair_int_plots <- pairs_cat %>%
# map(.x = ., .f = \(vp) plot_simple_int(train6, vp[1], vp[2]))
#
# cat_plots <- walk2(.x = cat_pair_int_plots, .y = seq_along(cat_pair_int_plots), .f = save_plot)
# cat_slick <- map(1:length(cat_pair_int_plots), .f = \(i) paste0("PairInt", i, ".png"))
# save(cat_slick, file = "Plots cat.RData")
load("Plots cat.RData")
slickR::slickR(cat_slick, height = "480px", width = "672px") +
slickR::settings(slidesToShow = 1, dots = TRUE)
Figure 6.1: Interaction effects for pairs of categorical variables against the response.
Parallel lines indicate no significant interaction effect, while lines that cross indicate the potential for a significant interaction. I’ll highlight a few interactions below.
plot_simple_int(train6, "CryoSleep", "HomePlanet")
Figure 6.2: Interaction between CryoSleep and HomePlanet
The plot of CryoSleep against HomePlanet suggests that passengers from Earth are less likely to be transported when in cryosleep, so the interaction CryoSleep & HomePlanet could be a useful feature.
plot_simple_int(train6, "Deck", "Side")
Figure 6.3: Interaction between Deck and Side
The interaction between Deck and Side, however, seems to show only a minor effect, if any.
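Crossing lines are only a visual hint. One possible numerical check, which is not part of the analysis above, is a likelihood-ratio test comparing a main-effects model with a model that adds the interaction. The sketch below runs on simulated data; the variables `a`, `b` and `y`, the sample size and the effect sizes are all assumptions chosen for illustration.

```r
set.seed(1)
n <- 2000

# Simulated categorical predictors (hypothetical names)
sim <- data.frame(
  a = factor(sample(c("no", "yes"), n, replace = TRUE)),
  b = factor(sample(c("p", "q"), n, replace = TRUE))
)

# True log-odds include an a-by-b interaction of size 1.5
eta <- -0.5 +
  0.5 * (sim$a == "yes") +
  0.3 * (sim$b == "q") +
  1.5 * (sim$a == "yes") * (sim$b == "q")
sim$y <- rbinom(n, 1, plogis(eta))

# Main effects only versus main effects plus interaction
fit_main <- glm(y ~ a + b, data = sim, family = binomial())
fit_int  <- glm(y ~ a * b, data = sim, family = binomial())

# Likelihood-ratio (chi-squared) test between the nested models
lrt <- anova(fit_main, fit_int, test = "Chisq")
print(lrt)
```

A small p-value in the second row indicates that the interaction term improves the fit. On the real data, the same comparison could be run with, for example, `Transported ~ CryoSleep + HomePlanet` against `Transported ~ CryoSleep * HomePlanet` on `train6`.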
Figures 6.4 and 6.5 below show all the interaction effects between both numerical and categorical variables.
plot_simple_int2 <- function(df, v1, v2) {
tmp_vars <- c("Transported", v1, v2)
tmp_model <- glm(Transported ~ (.)^2, data = df[, tmp_vars], family = binomial())
p <- interactions::interact_plot(tmp_model, pred = {{ v1 }}, modx = {{ v2 }}, geom = "line", colors = c25,
main.title = paste(v1, "and", v2)) +
scale_x_log10() +
theme(text = element_text(size = 40), plot.title = element_text(size = 40))
return(p)
}
preds_num <- c("Age", "RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck", "CabinNumber", "LastNameAsNumber",
"PassengerGroup", "CryoSleep", "HomePlanet", "Destination", "VIP", "Deck", "Side")
pairs_num <- combn(preds_num, 2, simplify = FALSE)
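pairs_num <- pairs_num[1:90] # Keep the 90 pairs whose first variable is numerical; the remaining 15 are the categorical-only pairs plotted above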
save_plot2 <- function(p, i) {
ggsave(filename = paste0("PairIntNum", i, ".png"), plot = p)
}
# num_pair_int_plots <- pairs_num %>%
# map(.x = ., .f = \(vp) plot_simple_int2(train6, vp[1], vp[2]))
#
# int_plots_slick <- walk2(.x = num_pair_int_plots, .y = seq_along(num_pair_int_plots), .f = save_plot2)
# int_plots_slick <- map(1:length(num_pair_int_plots), .f = \(i) paste0("PairIntNum", i, ".png"))
# save(int_plots_slick, file = "Plots num.RData")
load("Plots num.RData")
slickR::slickR(int_plots_slick[1:45], height = "480px", width = "672px") +
slickR::settings(slidesToShow = 1, dots = TRUE)
Figure 6.4: Interaction effects for pairs of variables against the response. Part 1.
slickR::slickR(int_plots_slick[46:90], height = "480px", width = "672px") +
slickR::settings(slidesToShow = 1, dots = TRUE)